Accelerating hierarchical acoustic likelihood computation on graphics processors
نویسندگان
چکیده
The paper presents a method for performance improvements of a speech recognition system by moving a part of the computation acoustic likelihood computation onto a Graphics Processor Unit (GPU). In the system, GPU operates as a low cost powerful coprocessor for linear algebra operations. The paper compares GPU implementation of two techniques of acoustic likelihood computation: full Gaussian computation of all components and a significantly faster Gaussian selection method using hierarchical evaluation. The full Gaussian computation is an ideal candidate for GPU implementation because of its matrix multiplication nature. The hierarchical Gaussian computation is a technique commonly used on a CPU since it leads to much better performance by pruning the computation volume. Pruning techniques are generally much harder to implement on GPUs, nevertheless, the paper shows that hierarchical Gaussian computation can still be done using a GPU with much better performance than the full Gaussian computation.
منابع مشابه
GPU accelerated acoustic likelihood computations
This paper introduces the use of Graphics Processors Unit (GPU) for computing acoustic likelihoods in a speech recognition system. In addition to their high availability, GPUs provide high computing performance at low cost. We have used a NVidia GeForce 8800GTX programmed with the CUDA (Compute Unified Device Architecture) which shows the GPU as a parallel coprocessor. The acoustic likelihoods ...
متن کاملAccelerating Signal Processing Algorithms Using Graphics Processors
There is increased interest in the use of graphics processing units (GPUs) for general purpose computation. This is because GPUs are almost two orders of magnitude faster in terms of floating point throughput compared to conventional CPUs. In this paper we investigate the use of graphics processing units for accelerating signal processing algorithms, specifically FIR filters and the FFT. We des...
متن کاملShell: A Spatial Decomposition Data Structure for 3D Curve Traversal on Many-Core Architectures
Shared memory many-core processors such as GPUs have been extensively used in accelerating computation-intensive algorithms and applications. When porting existing algorithms from sequential or other parallel architecture models to shared memory many-core architectures, non-trivial modifications are often needed in order to match the execution patterns of the target algorithms with the characte...
متن کاملAccelerating Parallel Frequent Itemset Mining on Graphics Processors with Sorting
Frequent Itemset Mining (FIM) is one of the most investigated fields of data mining. The goal of Frequent Itemset Mining (FIM) is to find the most frequently-occurring subsets from the transactions within a database. Many methods have been proposed to solve this problem, and the Apriori algorithm is one of the best known methods for frequent Itemset mining (FIM) in a transactional database. In ...
متن کاملMPI- and CUDA- implementations of modal finite difference method for P-SV wave propagation modeling
Among different discretization approaches, Finite Difference Method (FDM) is widely used for acoustic and elastic full-wave form modeling. An inevitable deficit of the technique, however, is its sever requirement to computational resources. A promising solution is parallelization, where the problem is broken into several segments, and the calculations are distributed over different processors. ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2010